The Goal:

In this article, we will explore the magic of Transfer Learning (TL). In particular, we will build a dataset of Disney Princesses and try to predict which Disney Princess someone most likely is. Because we do not plan to spend too much time collecting Disney image data for our image set, we will use a pretrained model. Specifically, we will build a base model from the MobileNet V2 model developed at Google. It is pre-trained on the ImageNet dataset, a large dataset of 1.4M images in 1000 classes. ImageNet is a research training dataset with a wide variety of categories such as jackfruit and syringe. On top of this base model, we will add our own classification layer for the Disney Princesses. The outcome will be a Convolutional Neural Network (CNN). Let's see how well we do.

Image classification:

Image classification is a supervised learning problem. We define a set of target classes (in our case Disney Princesses), and train a model to recognize them using labeled example photos. In our example, we will make use of TensorFlow 2.x in order to build, train, and optimize our model.

Let's jump right into the code. First, we import all required dependencies. The key components are:

(1) TensorFlow: a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks.
(2) Keras: a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation, because being able to go from idea to result with the least possible delay is key to doing good research. In TensorFlow 2.x it ships as tf.keras, which is what we import below.
(3) NumPy: a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
(4) Matplotlib: a plotting library for Python and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
(5) os: a module that provides a portable way of using operating-system-dependent functionality.
(6) zipfile: ZIP is a common archive and compression format. This module provides tools to create, read, write, append, and list a ZIP file. Any advanced use of this module requires an understanding of the format, as defined in the PKZIP Application Note.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import numpy as np
import matplotlib.pyplot as plt

import os
import zipfile

Second, we load our data into Colab from Google Drive. To do so, we need to mount Google Drive and authenticate ourselves in order to access the data in the cloud. In my case, I stored the Disney dataset in "/content/gdrive/My Drive/Datasets/". The file name is "". Of course, you can also upload the dataset manually to Colab or use any other storage solution.

#Mount Google Drive (Colab asks you to authorize access)
from google.colab import drive
drive.mount('/content/gdrive')

#We will create a temporary directory where we are going to store the picture data
!mkdir disney4
#Changing into the directory is not necessary, since we pass the extraction path directly to !unzip via -d

#Unzip our .zip file into the directory
!unzip "/content/gdrive/My Drive/Datasets/" -d 'disney4'

#Path where we have stored our Pictures in Colab
PATH = 'disney4'
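Since we import zipfile anyway, the mkdir/unzip step can also be done in pure Python. The sketch below is a hypothetical helper (extract_dataset is not part of any library), and the paths you pass in are placeholders for your own archive and target folder.

```python
import os
import zipfile


def extract_dataset(zip_path, target_dir):
    """Extract a .zip archive into target_dir and return its top-level entries.

    Hypothetical helper: a pure-Python alternative to `!mkdir` + `!unzip -d`.
    """
    os.makedirs(target_dir, exist_ok=True)  # like !mkdir, but no error if it already exists
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)  # like !unzip <zip_path> -d <target_dir>
    return sorted(os.listdir(target_dir))
```

Usage would look like extract_dataset("/content/gdrive/My Drive/Datasets/<your-dataset>.zip", "disney4"), where <your-dataset>.zip stands for your own file name.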

Next, we specify the training and validation directories. Our dataset contains one train and one validation folder, so we simply build the respective paths with the os.path.join function.

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')
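The flow_from_directory method used further down derives the class labels from subfolder names, so train and validation should each contain one subfolder per class (for example disney4/train/Elsa/). A quick sanity check is to count the images per class folder. The helper below is hypothetical, and the accepted file extensions are an assumption.

```python
import os


def count_images_per_class(directory, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: number_of_images} for a train/ or validation/ folder.

    Hypothetical helper, not part of Keras. Assumes one subfolder per class.
    """
    counts = {}
    for class_name in sorted(os.listdir(directory)):
        class_dir = os.path.join(directory, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files lying next to the class folders
        counts[class_name] = sum(
            1 for f in os.listdir(class_dir) if f.lower().endswith(extensions)
        )
    return counts
```

Calling count_images_per_class(train_dir) should then list every class with its number of training images.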

We then set up the variables that we will use while pre-processing the dataset and training the network. As we only have a small dataset and want to avoid overfitting our model right away, I recommend a maximum of 50 epochs with a batch size of 5. In addition, we will resize our pictures to 150x150 pixels, so every image becomes a 3D tensor of shape (150, 150, 3): height, width and the three RGB channels.

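batch_size = 5
epochs = 50
IMG_HEIGHT = 150
IMG_WIDTH = 150
IMG_SHAPE = (IMG_HEIGHT, IMG_WIDTH, 3)  # height, width, RGB channels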
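To see what a batch size of 5 means in practice, here is the arithmetic for one full pass over the data, using the 145 training and 14 validation images that the generators report further down.

```python
import math

num_train_images = 145  # "Found 145 images belonging to 14 classes."
num_val_images = 14     # "Found 14 images belonging to 14 classes."
batch_size = 5

# One full pass (epoch) over N images needs ceil(N / batch_size) batches;
# the last batch may be smaller than batch_size.
train_steps = math.ceil(num_train_images / batch_size)
val_steps = math.ceil(num_val_images / batch_size)
print(train_steps, val_steps)  # 29 3
```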

The next step is data preparation. We format the images into appropriately pre-processed floating point tensors before feeding them to the network: we decode the image files into grids of RGB pixel values, convert these into floating point tensors and finally rescale the values from the range 0 to 255 to the range 0 to 1, as neural networks prefer to deal with small input values. For all of this, we use the `ImageDataGenerator` class provided by `tf.keras`. It reads images from disk, preprocesses them into proper tensors and sets up generators that turn these images into batches of tensors, which is helpful when training the network.

train_image_generator = ImageDataGenerator(rescale=1./255) # Generator for our training data
validation_image_generator = ImageDataGenerator(rescale=1./255) # Generator for our validation data
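The rescale argument simply multiplies every pixel value by the given factor. The NumPy sketch below mimics this on a made-up 2x2 RGB image to show that 8-bit values end up between 0 and 1.

```python
import numpy as np

# Made-up 2x2 RGB "image" with raw 8-bit pixel values (0..255)
raw = np.array([[[0, 128, 255], [64, 32, 16]],
                [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

# rescale=1./255 multiplies every pixel by 1/255 (after conversion to float)
scaled = raw.astype(np.float32) * (1.0 / 255)

print(scaled.min(), scaled.max())
```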

After defining the generators for training and validation images, we call their flow_from_directory method, which loads the images from disk, applies the rescaling, and resizes the images to the required dimensions.

train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='categorical')
#Found 145 images belonging to 14 classes.

val_data_gen = validation_image_generator.flow_from_directory(batch_size=batch_size,
                                                              directory=validation_dir,
                                                              target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                              class_mode='categorical')
#Found 14 images belonging to 14 classes.

Let's have a look at our classes.

labels = train_data_gen.class_indices
labels = dict((v, k) for k, v in labels.items())  # invert to {index: class name}
print(labels)

This worked perfectly. We now see a dictionary of our classes: the 14 classes in our dataset, which include a 'Cats' class alongside the Disney Princesses.

{0: 'Anna', 1: 'Ariel', 2: 'Aurora', 3: 'Belle', 4: 'Cats', 5: 'Cinderella', 6: 'Elsa', 7: 'Jasmine', 8: 'Merida', 9: 'Moana', 10: 'Mulan', 11: 'Rapunzel', 12: 'Snow', 13: 'Tiana'}
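With flow_from_directory's default class_mode='categorical', each label batch is a one-hot matrix of shape (batch_size, 14). The sketch below uses a made-up label batch to show how argmax plus the inverted dictionary turns such rows back into class names.

```python
import numpy as np

# Inverted class_indices mapping, as printed above
labels = {0: 'Anna', 1: 'Ariel', 2: 'Aurora', 3: 'Belle', 4: 'Cats',
          5: 'Cinderella', 6: 'Elsa', 7: 'Jasmine', 8: 'Merida', 9: 'Moana',
          10: 'Mulan', 11: 'Rapunzel', 12: 'Snow', 13: 'Tiana'}

# Made-up label batch of 3 one-hot rows (batch_size x num_classes),
# shaped like what a generator with class_mode='categorical' yields
label_batch = np.eye(len(labels))[[6, 0, 13]]

# argmax recovers the class index of each one-hot row
names = [labels[int(i)] for i in np.argmax(label_batch, axis=1)]
print(names)  # ['Elsa', 'Anna', 'Tiana']
```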

Now let's visualize the training images by extracting a batch of images from the training generator and plotting five of them with matplotlib. Hopefully you can still recognize the images after resizing and rescaling the pixel values.

sample_training_images, _ = next(train_data_gen)

# This function plots images in a grid with 1 row and 5 columns, one image per column.
def plotImages(images_arr):
    fig, axes = plt.subplots(1, 5, figsize=(20, 20))
    axes = axes.flatten()
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
        ax.axis('off')
    plt.tight_layout()
    plt.show()

plotImages(sample_training_images[:5])


The plot shows five sample images from our training dataset. Remember, we have 145 training images and 14 validation images.

Now it is time to create our model. As mentioned above, we use the pretrained MobileNetV2 model by Google as our base model, because we do not have a vast amount of training data for our Disney Princesses. This approach is called transfer learning and is especially valuable when only little training data is available.


# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,  # drop the 1000-class ImageNet classifier
                                               weights='imagenet')

Downloading the model can take some time, as we are working with a 9+ MB model. The weights come as an HDF5 file, a file format designed to store and organize large, complex collections of numerical data.


Downloading data from
9412608/9406464 [==============================] - 0s 0us/step

As we do not want to retrain the base model (MobileNetV2), we exclude it from the training process by setting its trainable attribute to False.

base_model.trainable = False

Let's take a look at the base model architecture. Although it looks fairly complex, most of the model is built from standard components such as convolutional layers, batch normalization and activation functions (ReLU6) that introduce non-linearity.

base_model.summary()



To find out what the classification layer on top of the base model will receive, we pass a single image through the base model and inspect its output, the feature batch.

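# Take one (already rescaled) training image as sample input
img = sample_training_images[0]
final_img = tf.image.resize(img, [IMG_HEIGHT, IMG_WIDTH])  # size is [height, width]

# Add a batch dimension: (150, 150, 3) -> (1, 150, 150, 3)
final_img_tfl = np.expand_dims(final_img, axis=0)
print(final_img_tfl.shape)

feature_batch = base_model(final_img_tfl)
print(feature_batch.shape)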


(1, 150, 150, 3)
(1, 5, 5, 1280)

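We build a GlobalAveragePooling layer on top of the base model. Remember, the base model currently outputs a feature batch of shape (1, 5, 5, 1280). For the classification of 14 classes, however, we just want a (1, 14) tensor. Global average pooling first averages each of the 1280 feature maps over its 5x5 spatial grid, which gives a (1, 1280) tensor; a dense layer with 14 units and a softmax activation then maps this to one probability per class.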
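Conceptually, global average pooling is just a mean over the two spatial axes. The NumPy sketch below applies it to random stand-in features (not real MobileNetV2 output) of the same shape.

```python
import numpy as np

# Random stand-in for the base model output: (batch, height, width, channels)
fake_features = np.random.rand(1, 5, 5, 1280).astype(np.float32)

# Global average pooling: average every channel over its 5x5 spatial grid
pooled = fake_features.mean(axis=(1, 2))

print(pooled.shape)  # (1, 1280)
```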

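global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
feature_batch_average = global_average_layer(feature_batch)
print(feature_batch_average.shape)

# 14 output units (one per class); softmax turns them into class probabilities
prediction_layer = tf.keras.layers.Dense(units=14, activation='softmax')
prediction_batch = prediction_layer(feature_batch_average)
print(prediction_batch.shape)

model = tf.keras.Sequential([
    base_model,
    global_average_layer,
    prediction_layer
])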


(1, 1280)
(1, 14)
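What the Dense(14, activation='softmax') layer computes can be sketched in NumPy: an affine map from 1280 pooled features to 14 scores, followed by a softmax. The weights here are random placeholders, not the trained ones; the sketch also reproduces the parameter counts that show up in the model summary.

```python
import numpy as np

num_features, num_classes = 1280, 14

rng = np.random.default_rng(0)
pooled = rng.standard_normal((1, num_features))              # stand-in for the (1, 1280) pooled features
W = rng.standard_normal((num_features, num_classes)) * 0.01  # placeholder kernel
b = np.zeros(num_classes)                                     # placeholder bias

# Dense layer: one weighted sum (plus bias) per class
logits = pooled @ W + b

# Softmax: subtract the row max for numerical stability, exponentiate, normalize
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

print(probs.shape)                 # (1, 14)
print(W.size + b.size)             # 17934 trainable parameters
print(2_257_984 + W.size + b.size) # 2275918 parameters in total
```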

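The next step is compiling our model. We choose the Adam optimizer and the categorical cross-entropy loss function. To view the training and validation accuracy for each epoch, we pass the metrics argument.

model.compile(optimizer=tf.keras.optimizers.Adam(),  # default learning rate; the original value is not stated
              loss='categorical_crossentropy',
              metrics=['accuracy'])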
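Categorical cross-entropy is the negative log of the probability the model assigns to the true class, averaged over the batch. The NumPy sketch below (clipping with a small epsilon, an assumption for numerical safety) shows that a model guessing uniformly over 14 classes has a loss of ln(14), roughly 2.64, which is a useful reference point when reading the training log.

```python
import math

import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy for one-hot y_true and probability rows y_pred."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


num_classes = 14
y_true = np.eye(num_classes)[[6]]                       # one sample whose true class is 6 ("Elsa")
uniform = np.full((1, num_classes), 1.0 / num_classes)  # an untrained model guessing uniformly

print(round(categorical_crossentropy(y_true, uniform), 4))  # 2.6391
print(round(math.log(num_classes), 4))                       # 2.6391
```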


Let's have a look at our complete model now.

model.summary()


Let's hope that these 17,934 trainable parameters (1280 × 14 weights plus 14 biases in the dense layer) are enough to get a decent accuracy for our Disney Princess predictions.


Model: "sequential"
Layer (type)                 Output Shape              Param #
mobilenetv2_1.00_224 (Model) (None, 5, 5, 1280)        2257984
global_average_pooling2d_1 ( (None, 1280)              0
dense (Dense)                (None, 14)                17934
Total params: 2,275,918
Trainable params: 17,934
Non-trainable params: 2,257,984

Let the training begin:

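# model.fit accepts generators in TF 2.x (fit_generator is deprecated); the log below shows 5 steps per epoch
history = model.fit(train_data_gen, steps_per_epoch=5, epochs=epochs, validation_data=val_data_gen)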


Epoch 1/50
Epoch 2/50
5/5 [==============================] - 1s 146ms/step - loss: 2.5233 - accuracy: 0.2800 - val_loss: 2.5921 - val_accuracy: 0.1667
Epoch 50/50
5/5 [==============================] - 1s 158ms/step - loss: 1.8199 - accuracy: 0.9600 - val_loss: 2.1558 - val_accuracy: 0.6250

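OK, 50 epochs of training with 96% training accuracy and 63% validation accuracy: quite impressive with only about 10 training images per class. Keep in mind, though, that the validation set holds only 14 images, so the validation accuracy is a rough estimate, and the gap to the training accuracy hints at some overfitting. Let's visualize our training and validation progress.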

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
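history.history is a plain dictionary that maps each metric name to a list with one value per epoch, so it can also be summarized numerically. The helper below is hypothetical; it reports the epoch with the best validation accuracy and the final gap between training and validation accuracy, demonstrated on a made-up three-epoch history.

```python
def summarize_history(history_dict):
    """Return (best_epoch, best_val_acc, final_gap) from a Keras-style history dict.

    Hypothetical helper. best_epoch is 1-based to match the "Epoch n/50" log
    lines; final_gap is last training accuracy minus last validation accuracy.
    """
    val_acc = history_dict['val_accuracy']
    best_index = max(range(len(val_acc)), key=val_acc.__getitem__)
    final_gap = history_dict['accuracy'][-1] - val_acc[-1]
    return best_index + 1, val_acc[best_index], final_gap


# Made-up three-epoch history, shaped like history.history
demo = {'accuracy': [0.28, 0.60, 0.96], 'val_accuracy': [0.17, 0.65, 0.63]}
print(summarize_history(demo))
```

For the real run, summarize_history(history.history) would be the call; a large final gap is one sign of overfitting.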

Cool, we have built a decent model by leveraging transfer learning. Let's give it a shot and throw some Disney images at it to see how it does...

image_path = "/content/gdrive/My Drive/Datasets"

def loadImages(path):
    '''Return a sorted list with the paths of all JPEG images in the folder'''
    image_file = sorted([os.path.join(path, file)
                         for file in os.listdir(path)
                         if file.lower().endswith(('.jpg', '.jpeg'))])
    return image_file

image_list = loadImages(image_path)

# Take the first image in the folder
path_string = image_list[0]


img = tf.io.read_file(path_string)
img = tf.image.decode_jpeg(img, channels=3)
img = tf.image.convert_image_dtype(img, tf.float32)  # uint8 [0, 255] -> float32 [0, 1], like rescale=1./255
final_img = tf.image.resize(img, [IMG_HEIGHT, IMG_WIDTH])

plt.imshow(final_img)
plt.axis('off')
plt.show()

#Expand Tensor for Model (Input shape)
y = np.expand_dims(final_img, axis=0)

#Predict Image Tensor with model
prediction = model.predict(y)
prediction_squeeze = np.squeeze(prediction, axis=0)

for key, value in labels.items():
    probability = prediction_squeeze[key]
    print("{0:.0%}".format(probability), value)

91% Elsa
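Printing all 14 probabilities gets noisy, so it can help to show only the most likely classes. The hypothetical helper below sorts a probability vector with np.argsort; it is demonstrated on a made-up three-class vector, while in the notebook you would pass prediction_squeeze and labels.

```python
import numpy as np


def top_k_predictions(probabilities, labels, k=3):
    """Return the k most likely (class_name, probability) pairs, highest first."""
    top_indices = np.argsort(probabilities)[::-1][:k]
    return [(labels[int(i)], float(probabilities[i])) for i in top_indices]


# Made-up probability vector over three classes (stand-in for prediction_squeeze)
demo_labels = {0: 'Anna', 1: 'Elsa', 2: 'Moana'}
demo_probs = np.array([0.05, 0.91, 0.04])

for name, p in top_k_predictions(demo_probs, demo_labels, k=2):
    print("{0:.0%} {1}".format(p, name))
# 91% Elsa
# 5% Anna
```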


Yes, we did it! A cool application of Transfer Learning (using MobileNetV2) for image classification, made especially easy by the built-in Keras functions.

